-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graph Library #1521
base: master
Are you sure you want to change the base?
Graph Library #1521
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 41 files reviewed, 1 unresolved discussion (waiting on @lockshaw)
lib/utils/src/utils/graph/series_parallel/series_parallel_decomposition.cc
line 82 at r2 (raw file):
} bool is_empty(Node const &node) {
borrowed from #1562, will fix them there if needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 23 of 32 files at r1, 12 of 16 files at r2, 7 of 9 files at r3, all commit messages.
Reviewable status: 42 of 47 files reviewed, 35 unresolved discussions (waiting on @Marsella8)
lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc
line 10 at r3 (raw file):
using namespace FlexFlow; TEST_CASE("get_connected_components") {
Suggestion:
TEST_SUITE(FF_TEST_SUITE) {
TEST_CASE("get_connected_components") {
lib/utils/test/src/utils/graph/undirected/algorithms/get_connected_components.cc
line 41 at r3 (raw file):
} SUBCASE("3 components") {
Also test the case where everything is one component
Also test the case where the graph is empty
lib/utils/test/src/utils/graph/digraph/algorithms/get_dominators.cc
line 34 at r3 (raw file):
std::unordered_set<Node> correct = {n.at(0)}; CHECK(correct == result); }
Suggestion:
SUBCASE("small acyclic graph") {
std::vector<Node> n = add_nodes(g, 4);
std::vector<DirectedEdge> e = {
DirectedEdge{n.at(0), n.at(3)},
DirectedEdge{n.at(0), n.at(1)},
DirectedEdge{n.at(0), n.at(2)},
DirectedEdge{n.at(1), n.at(2)},
};
add_edges(g, e);
SUBCASE("get_dominators(..., Node const &)") {
Node node = n.at(2);
std::unordered_set<Node> correct = {n.at(0), n.at(2)};
std::unordered_set<Node> result = get_dominators(g, node);
CHECK(correct == result);
}
SUBCASE("get_dominators(..., std::unordered_set<Node> const &)") {
std::unordered_set<Node> nodes = {n.at(1), n.at(3)};
std::unordered_set<Node> result = get_dominators(g, nodes);
std::unordered_set<Node> correct = {n.at(0)};
CHECK(correct == result);
}
}
lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h
line 20 at r3 (raw file):
std::unordered_multiset<Node> get_nodes(Node const &); bool is_empty(Node const &node);
How would you end up with an empty SP decomposition? That doesn't seem like it should be allowed (i.e., might be a good idea to change the container in the splits to be some wrapper type of std::vector
that requires itself to be non-empty)
lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h
line 28 at r3 (raw file):
SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp, Node const &node);
How is this a well-behaved operation? Like, what are the semantics of this? Are you sure we want this function as part of the API for SeriesParallelDecomposition
?
Code quote:
SeriesParallelDecomposition delete_node(SeriesParallelDecomposition sp,
Node const &node);
lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h
line 30 at r3 (raw file):
Node const &node); // duplicate nodes within `sp` are counted multiple times
Doxygen style please 🙂
Code quote:
// duplicate nodes within `sp` are counted multiple times
lib/utils/include/utils/graph/series_parallel/series_parallel_decomposition.h
line 33 at r3 (raw file):
size_t num_nodes(SeriesParallelDecomposition const &sp); SeriesParallelDecomposition serial_composition(
For consistency, never use "serial" in the FF codebase, always use "series"
Suggestion:
SeriesParallelDecomposition series_composition(
lib/utils/include/utils/graph/series_parallel/series_reduction.h
line 19 at r3 (raw file):
std::unordered_set<std::vector<MultiDiEdge>> find_all_extended_series_reductions(MultiDiGraphView const &g);
What is an "extended series reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation
Also, it might be nice to create an actual datatype (e.g., ExtendedSeriesReduction
) that is a wrapper type around std::vector<MultiDiEdge>
rather than just using a raw vector.
Code quote:
find_all_extended_series_reductions(MultiDiGraphView const &g);
lib/utils/test/src/utils/graph/undirected/undirected.cc
line 15 at r3 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE_TEMPLATE(
Why are there two tests here?
lib/utils/test/src/utils/graph/undirected/undirected.cc
line 18 at r3 (raw file):
"UndirectedGraph implementations", T, HashmapUndirectedGraph) { RC_SUBCASE("Full", [&]() {
Why is this using rapidcheck?
lib/utils/src/utils/graph/views/views.cc
line 2 at r3 (raw file):
#include "utils/graph/views/views.h" #include "utils/bidict/algorithms/right_entries.h"
I'm not seeing `right_entries used anywhere here in new code. Was this already necessary and just missing, or can this be removed?
lib/utils/src/utils/graph/views/views.cc
line 70 at r3 (raw file):
UndirectedEdge to_undirected_edge(DirectedEdge const &e) { return UndirectedEdge{{e.src, e.dst}};
Does commutative_pair
not have an explicit
constructor?
Code quote:
{e.src, e.dst}
lib/utils/src/utils/graph/views/views.cc
line 122 at r3 (raw file):
std::unordered_set<UndirectedEdge> undirected_edges = set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}), g.query_edges(UndirectedEdgeQuery{q.dsts}));
Can't you just union the query_sets (i.e., q.srcs
and q.dsts
) and then do the query, rather than unioning the query results?
Code quote:
std::unordered_set<UndirectedEdge> undirected_edges =
set_union(g.query_edges(UndirectedEdgeQuery{q.srcs}),
g.query_edges(UndirectedEdgeQuery{q.dsts}));
lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc
line 12 at r3 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("get_unchecked_dfs_ordering") {
IIRC the semantics of unchecked_dfs_ordering
are essentially that it is free to visit nodes multiple times (i.e., in a diamond graph of 4 nodes (a -> b, a -> c, b -> d, c -> d), it would either yield [a, b, d, c, d] or [a, c, d, b, d]--it would be good to add a testcase for this
lib/utils/test/src/utils/graph/digraph/algorithms/traversal.cc
line 24 at r3 (raw file):
std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]}); CHECK(correct == result); }
Suggestion:
SUBCASE("simple path") {
std::vector<Node> correct = {n[0], n[1], n[2], n[3]};
std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[0]});
CHECK(correct == result);
}
SUBCASE("start from non-initial node") {
std::vector<Node> correct = {n[1], n[2], n[3]};
std::vector<Node> result = get_unchecked_dfs_ordering(g, {n[1]});
CHECK(correct == result);
}
lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc
line 11 at r3 (raw file):
TEST_SUITE(FF_TEST_SUITE) { TEST_CASE("DiGraph - algorithms.cc") {
Prefer one TEST_CASE per function, rather than one SUBCASE per function
lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc
line 16 at r3 (raw file):
std::vector<Node> n = add_nodes(g, 4); std::vector<DirectedEdge> e = { DirectedEdge{n[0], n[1]},
Prefer .at
for bounds checking
Suggestion:
DirectedEdge{n.at(0), n.at(1)},
lib/utils/test/src/utils/graph/digraph/algorithms/algorithms.cc
line 55 at r3 (raw file):
} SUBCASE("get_sinks") {
Potentially rename to get_terminal_nodes
/get_initial_nodes
following https://reviewable.io/reviews/flexflow/flexflow-train/1565#-OGCWYwtE_rcz9e1WizH
lib/utils/test/src/utils/containers/contains.cc
line 19 at r3 (raw file):
std::unordered_set<int> s = {1, 2, 3, 4, 5}; CHECK(contains(s, 3)); CHECK(!contains(s, 6));
Doesn't doctest require this to use CHECK_FALSE
? iirc doctest has an issue with CHECK(!...)
Code quote:
CHECK(!contains(s, 6));
lib/utils/test/src/utils/graph/views/views.cc
line 49 at r3 (raw file):
CHECK(result == expected); } }
Probably the best way to test these (due to the complexity of the querying interface) would be to use rapidcheck to generate a bunch of operations and then ensure the behaviors are the same between the view and the non-view graph that would be generated. This is going to be annoying to write, but without restructuring things this is probably the correct approach. The good thing is that the equivalence testing code only needs to be written once per graphview type and then can be reused for each of the views.
Code quote:
TEST_CASE("UndirectedSubgraphView") {
UndirectedGraph g = UndirectedGraph::create<HashmapUndirectedGraph>();
std::vector<Node> n = add_nodes(g, 5);
add_edges(g,
{UndirectedEdge{{n.at(0), n.at(3)}},
UndirectedEdge{{n.at(1), n.at(1)}},
UndirectedEdge{{n.at(1), n.at(2)}},
UndirectedEdge{{n.at(1), n.at(3)}},
UndirectedEdge{{n.at(2), n.at(3)}},
UndirectedEdge{{n.at(2), n.at(4)}}});
std::unordered_set<Node> sub_nodes = {n.at(0), n.at(1), n.at(3)};
UndirectedGraphView view = view_subgraph(g, sub_nodes);
SUBCASE("get_nodes") {
std::unordered_set<Node> expected = {n.at(0), n.at(1), n.at(3)};
std::unordered_set<Node> result = get_nodes(view);
CHECK(result == expected);
}
SUBCASE("get_edges") {
std::unordered_set<UndirectedEdge> expected = {
UndirectedEdge{{n.at(0), n.at(3)}},
UndirectedEdge{{n.at(1), n.at(1)}},
UndirectedEdge{{n.at(1), n.at(3)}},
};
std::unordered_set<UndirectedEdge> result = get_edges(view);
CHECK(result == expected);
}
}
lib/utils/include/utils/graph/series_parallel/parallel_reduction.h
line 15 at r3 (raw file):
find_parallel_reduction(MultiDiGraphView const &); std::unordered_map<DirectedEdge, std::unordered_set<MultiDiEdge>>
What is an "extended parallel reduction"? Correct me if I'm wrong here but I don't think this is standard terminology(?) in which case it would be good to have a doxygen docstring here with some explanation
Also, it might be nice to create an actual datatype (e.g., ExtendedParallelReduction
) that is a wrapper type around std::unordered_set<MultiDiEdge>
rather than just using a raw unordered set
If this is just for accelerating the SP decomposition algorithm's rewriting step, it might be better to put these functions (and the corresponding ones for "extended series reductions" in that file rather than in the general "parallel_reduction.h" header)
lib/utils/test/src/utils/graph/multidigraph/algorithms/get_outgoing_edges.cc
line 35 at r3 (raw file):
SUBCASE("node has no outgoing edges") { std::unordered_set<MultiDiEdge> result = get_outgoing_edges(g, n.at(2));
Would be nice for this node to have some incoming edges to ensure they're not accidentally picked up
lib/utils/test/src/utils/graph/multidigraph/algorithms/get_incoming_edges.cc
line 31 at r3 (raw file):
SUBCASE("node has no incoming edges") { std::unordered_set<MultiDiEdge> result = get_incoming_edges(g, n.at(2));
Would be nice for this node to have some outgoing edges to ensure they're not accidentally picked up
lib/utils/include/utils/graph/node/node.struct.toml
line 9 at r3 (raw file):
"fmt", "json", "rapidcheck",
How are you using a default Arbitrary instance for Node
? I have no issue adding generators for Node
, but the default instance seems unlikely to be helpful as Node
is only meaningful in the context of a graph?
lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc
line 263 at r3 (raw file):
find_all_extended_series_reductions(g); std::unordered_set<std::vector<MultiDiEdge>> correct = { {e[0], e[1], e[2]}};
Prefer .at
for bounds checking
Suggestion:
{e.at(0), e.at(1), e.at(2)}};
lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc
line 267 at r3 (raw file):
} SUBCASE("2 linear strands") {
Suggestion:
SUBCASE("2 linear strands that share a common terminal node") {
lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc
line 277 at r3 (raw file):
std::unordered_set<std::vector<MultiDiEdge>> result = find_all_extended_series_reductions(g); std::unordered_set<std::vector<MultiDiEdge>> correct = {{e[0], e[2]},
Prefer .at
for bounds checking
lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc
line 300 at r3 (raw file):
find_all_extended_series_reductions(g); std::unordered_set<std::vector<MultiDiEdge>> correct = { {e[0], e[2], e[7]}, {e[3], e[6]}, {e[5], e[9]}};
Prefer .at
for bounds checking
lib/utils/test/src/utils/graph/series_parallel/series_reduction.cc
line 372 at r3 (raw file):
std::unordered_set<MultiDiEdge> result_edges = get_edges(g); std::unordered_set<MultiDiEdge> correct_edges = [&] { std::unordered_set<MultiDiEdge> new_edges = unordered_set_of(e);
Why not just set_minus
here?
lib/utils/include/utils/graph/multidigraph/algorithms/get_outgoing_edges.h
line 12 at r3 (raw file):
std::unordered_map<Node, std::unordered_set<MultiDiEdge>> get_outgoing_edges(MultiDiGraphView const &g);
Suggestion:
std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
get_outgoing_edges_map(MultiDiGraphView const &g);
lib/utils/include/utils/graph/multidigraph/algorithms/get_incoming_edges.h
line 12 at r3 (raw file):
std::unordered_map<Node, std::unordered_set<MultiDiEdge>> get_incoming_edges(MultiDiGraphView const &g);
Suggestion:
std::unordered_map<Node, std::unordered_set<MultiDiEdge>>
get_incoming_edges_map(MultiDiGraphView const &g);
lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc
line 27 at r3 (raw file):
DirectedEdgeQuery result = directed_edge_query_all(); CHECK(matches_edge(result, DirectedEdge{n.at(0), n.at(1)}));
Do you really need 5 checks here? It feels like the behavior of directed_edge_query_all
could have been tested on a much smaller graph (or even without a graph at all, using rapidcheck). This feels like unnecessary reusing of g
, which is why I recommend doing one TEST_CASE
per function instead of one SUBCASE
per function unless you have very good reason.
In fact, I don't see how g
is necessary for any of the cases in this file?
lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc
line 38 at r3 (raw file):
DirectedEdgeQuery{query_set{n.at(0)}, query_set{n.at(1)}}; CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));
Check that it doesn't match reversed edges
Code quote:
CHECK(matches_edge(q, DirectedEdge{n.at(0), n.at(1)}));
lib/utils/test/src/utils/graph/digraph/algorithms/directed_edge_query.cc
line 57 at r3 (raw file):
CHECK(result == correct); } SUBCASE("intersection with std::nullopt") {
Suggestion:
SUBCASE("intersection with matchall") {
lib/utils/src/utils/graph/series_parallel/series_reduction.cc
line 79 at r3 (raw file):
} return unordered_set_of(values(strands)); }
Newlines please--this is really hard to read
Code quote:
std::unordered_set<std::vector<MultiDiEdge>>
find_all_extended_series_reductions(MultiDiGraphView const &g) {
std::unordered_map<Node, std::unordered_set<MultiDiEdge>> incoming_edges =
get_incoming_edges(g);
std::unordered_map<Node, std::unordered_set<MultiDiEdge>> outgoing_edges =
get_outgoing_edges(g);
std::unordered_map<Node, std::vector<MultiDiEdge>> strands;
std::unordered_map<Node, Node> node_to_head_of_strand;
for (Node const &n : get_topological_ordering(g)) {
if ((incoming_edges.at(n).size() == 1) &&
(outgoing_edges.at(n).size() == 1)) {
MultiDiEdge incoming = get_only(incoming_edges.at(n));
MultiDiEdge outgoing = get_only(outgoing_edges.at(n));
Node pre = g.get_multidiedge_src(incoming);
if (contains_key(node_to_head_of_strand, pre)) {
Node head = node_to_head_of_strand.at(pre);
node_to_head_of_strand.emplace(n, head);
strands.at(head).push_back(outgoing);
} else {
node_to_head_of_strand.emplace(n, n);
strands[n].push_back(incoming);
strands[n].push_back(outgoing);
}
}
}
return unordered_set_of(values(strands));
}
Description of changes:
Extending testing functionality for graph library
Bug fixing
General clean up
Related Issues:
Linked Issues:
Issues closed by this PR:
This change is